-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: DataFrame repr with ArrowDtype with extension #54130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dtype=ArrowDtype(ArrowPeriodType("D")), | ||
) | ||
result = repr(df) | ||
expected = " col\n0 15340\n1 15341\n2 15342" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does "15340" come from? i'd expect something like "2012-01-01"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is because ArrowPeriodType
stores period data as pa.int64
. Since pyarrow.ExtensionType
is only used for serialization it appears it doesn't go through period reprs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im unclear on if this should be considered "wrong"? or just orthogonal to what this PR is doing? seems like this would be an unhelpful repr to give a user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a orthogonal incorrect behavior. This PR addresses a bug where pyarrow.ExtensionType
s didn't even show a repr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a comment that this test shouldn't be interpreted as saying this repr is "correct"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Added a comment
else: | ||
raise NotImplementedError(pa_type) | ||
elif isinstance(pa_type, pa.ExtensionType): | ||
return type(self)(pa_type.storage_type).type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to match the storage_type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the python type of the storage_type
Might also close #54063? |
I'll leave that open for now. I interpreted that issue as maybe there's a fix that could be backported to 2.0.4 (but I am not sure that there is) |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.I think it makes sense for the python
.type
of a pyarrow ExtensionType to be its storage's.type
.